Simple discriminant functions identify small sets of genes that distinguish cancer phenotype from normal.

Authors

  • Gul S Dalgin
  • Charles DeLisi
Abstract

High-throughput gene expression profiling can identify sets of genes that are differentially expressed between phenotypes. Discovering marker genes is particularly important in the diagnosis of a cancer phenotype. However, the gene sets produced to date are too large to be economically viable as diagnostics. We use a hybrid decision tree-discriminant analysis to identify small sets of genes, i.e. single genes and gene pairs, that separate normal samples from tumor samples at different stages. Half the samples are selected for training and are used to form the probability distribution of the expression values of each gene. The distributions for the tumor and normal phenotypes are then used to classify the test samples. The algorithm also identifies gene pairs by combining the probability distributions to construct a decision tree, which is used to determine the class of the test samples. After a series of training and testing sessions, the genes and gene pairs that classify all samples correctly are recorded. The method was applied to a breast cancer data set, and classifier genes that distinguish normal breast tissue from different stages of breast tumor were identified. The genes were ranked according to the minimum Euclidean distance between their expression values in tumor and normal samples. The algorithm picked known cancer-related genes and also found genes that were not identified as differentially expressed by a t-test with a 2-fold cut-off. Overall, the method generates candidate diagnostic genes and gene pairs for a specific disease phenotype that can be pursued for further biological interpretation in cancer biology.
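The single-gene part of the procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say what form the probability distributions take, so a normal distribution per gene and class is assumed here; the "minimum Euclidean distance" ranking is read as the smallest gap between any tumor value and any normal value of a gene; and the gene names, toy data, and function names (`perfect_genes`, `min_gap`) are hypothetical. Gene pairs and the decision tree are not covered.

```python
import math
import random
from statistics import mean, pstdev


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution (an assumed form, not from the source)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def fit(values, floor=1e-6):
    """Mean and population standard deviation, floored to avoid zero variance."""
    return mean(values), max(pstdev(values), floor)


def classify(x, normal_fit, tumor_fit):
    """Return 1 (tumor) if the tumor distribution is more likely for x, else 0 (normal)."""
    return int(gaussian_logpdf(x, *tumor_fit) > gaussian_logpdf(x, *normal_fit))


def half_split(indices, rng):
    """Randomly split a list of sample indices into training and test halves."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def perfect_genes(expr, labels, n_rounds=20, seed=0):
    """Keep genes that classify every test sample correctly in every round.

    expr maps a gene name to its expression values; labels holds 0 (normal)
    or 1 (tumor) for each sample, in the same order.
    """
    rng = random.Random(seed)
    normal_idx = [i for i, c in enumerate(labels) if c == 0]
    tumor_idx = [i for i, c in enumerate(labels) if c == 1]
    survivors = set(expr)
    for _ in range(n_rounds):
        n_train, n_test = half_split(normal_idx, rng)
        t_train, t_test = half_split(tumor_idx, rng)
        for gene in list(survivors):
            values = expr[gene]
            normal_fit = fit([values[i] for i in n_train])
            tumor_fit = fit([values[i] for i in t_train])
            if any(classify(values[i], normal_fit, tumor_fit) != labels[i]
                   for i in n_test + t_test):
                survivors.discard(gene)
    return survivors


def min_gap(values, labels):
    """Smallest distance between any tumor value and any normal value of one gene."""
    normal = [v for v, c in zip(values, labels) if c == 0]
    tumor = [v for v, c in zip(values, labels) if c == 1]
    return min(abs(t - n) for t in tumor for n in normal)


# Toy data (hypothetical): six normal samples followed by six tumor samples.
labels = [0] * 6 + [1] * 6
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 5.0, 5.2, 4.9, 5.1, 5.3, 4.8],
    "GENE_B": [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0],
}

selected = perfect_genes(expr, labels)
ranked = sorted(selected, key=lambda g: min_gap(expr[g], labels), reverse=True)
print(ranked)
```

In this toy input, `GENE_A` separates the two classes cleanly, while `GENE_B` takes the same values in both classes and is discarded. Ranking the surviving genes by `min_gap` puts the genes with the widest margin between tumor and normal values first.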

Similar resources

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next generation sequencing (NGS) data are the important sources to find helpful molecular patterns. Also, the great number of gene expression data increases the challenge of how to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...

The application of discriminant analysis in differentiation of fibroadenoma and ductal carcinoma of breast tissue using ultrasound velocity measurement

Background: Ultrasound propagation velocity was measured experimentally in normal, fibroadenoma and ductal carcinoma breast tissues, in order to distinguish normal breast tissue from tumors. Materials and methods: In quantitative measurements of ultrasound velocity, 403 breast tissue images were selected, comprising 130 normal breast tissue, 130 fibroadenoma, and 143 ductal carcinoma tumors. Th...

Modeling Breast Acini in Tissue Culture for Detection of Malignant Phenotype Reversion to Non-Malignant Phenotype

Backgrounds: Evidence is accumulating to support disruption of tissue architecture as a powerful event in tumor formation. For the past four decades, intensive cancer research with the premise of “cancer as a cell based-disease” focused on finding oncogenes or tumor suppressor genes. However, the role of the tissue architecture was neglected. Three dimensional (3D) cell cultures which can recap...

On sparse Fisher discriminant method for microarray data analysis

One of the applications of the discriminant analysis on microarray data is to classify patient and normal samples based on gene expression values. The analysis is especially important in medical trials and diagnosis of cancer subtypes. The main contribution of this paper is to propose a simple Fisher-type discriminant method on gene selection in microarray data. In the new algorithm, we calcula...

Identification of Gene Biomarkers for Distinguishing Small-Cell Lung Cancer from Non-Small-Cell Lung Cancer Using a Network-Based Approach

Lung cancer consists of two main subtypes: small-cell lung cancer (SCLC) and non-small-cell lung cancer (NSCLC) that are classified according to their physiological phenotypes. In this study, we have developed a network-based approach to identify molecular biomarkers that can distinguish SCLC from NSCLC. By identifying positive and negative coexpression gene pairs in normal lung tissues, SCLC, ...

Journal title:
  • Genome informatics. International Conference on Genome Informatics

Volume 16, Issue 1

Pages: -

Publication date: 2005